AITopics | Carlsbad

Apart from the already-large model parameters, the key/value (KV) cache that holds information about previous tokens in a sequence can grow to be even larger than the model itself. This problem is exacerbated in one of the current LLM serving frameworks which reserves the maximum sequence length of memory for the KV cache to guarantee generating a complete sequence as they do not know the output sequence length. This restricts us to use a smaller batch size leading to lower GPU utilization and above all, lower throughput. We argue that designing a system with a priori knowledge of the output sequence can mitigate this problem.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > Carlsbad (0.04)
Asia > Taiwan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

a9b7ba70783b617e9998dc4dd82eb3c5-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 18:07:12 GMT

artificial intelligence, machine learning, transformation space, (18 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(11 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

1f14ac136d55c34a18a04ce3db083599-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 20:14:10 GMT

Augmenting tactic-based interactive theorem provers with neural guidance has been the focus of increased attention in recent years [1, 2, 3, 4, 5]. The dominant approach uses imitation learning on corpora of formalized mathematics. However, despite recent efforts involving self-supervised pre-training [5] or data-augmentation [6], this approach is limited by the conspicuous scarcity of human-producedtrainingdata.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Europe > Austria (0.05)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.49)

Add feedback

PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving

Fang, Fei, Hua, Yifan, Wang, Shengze, Zhou, Ruilin, Liu, Yi, Qian, Chen, Zhang, Xiaoxue

arXiv.org Artificial IntelligenceDec-12-2025

While significant progress has been made in research and development on open-source and cost-efficient large-language models (LLMs), serving scalability remains a critical challenge, particularly for small organizations and individuals seeking to deploy and test their LLM innovations. Inspired by peer-to-peer networks that leverage decentralized overlay nodes to increase throughput and availability, we propose GenTorrent, an LLM serving overlay that harnesses computing resources from decentralized contributors. We identify four key research problems inherent to enabling such a decentralized infrastructure: 1) overlay network organization; 2) LLM communication privacy; 3) overlay forwarding for resource efficiency; and 4) verification of serving quality. This work presents the first systematic study of these fundamental problems in the context of decentralized LLM serving. Evaluation results from a prototype implemented on a set of decentralized nodes demonstrate that GenTorrent achieves a latency reduction of over 50% compared to the baseline design without overlay forwarding. Furthermore, the security features introduce minimal overhead to serving latency and throughput. We believe this work pioneers a new direction for democratizing and scaling future AI serving capabilities.

large language model, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

2504.20101

Country:

South America (0.04)
North America > United States > Nevada > Washoe County > Reno (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

925869234d3aa2a3aad5f05b643974aa-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 09:50:14 GMT

computational linguistic, language model, rtd, (16 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
North America > Dominican Republic (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(10 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.94)

Add feedback

Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models

Neural Information Processing SystemsOct-9-2025, 22:42:30 GMT

Recently, various strategies for distributed training of large language models (LLMs) have been proposed.

communication, gradient, opération, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > San Diego County > Carlsbad (0.04)
North America > United States > Virginia (0.04)
(12 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Law (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Increasing GPU Utilization during Generative Inference for Higher Throughput

Neural Information Processing SystemsOct-8-2025, 11:37:11 GMT

Apart from the already-large model parameters, the key/value (KV) cache that holds information about previous tokens in a sequence can grow to be even larger than the model itself. This problem is exacerbated in one of the current LLM serving frameworks which reserves the maximum sequence length of memory for the KV cache to guarantee generating a complete sequence as they do not know the output sequence length. This restricts us to use a smaller batch size leading to lower GPU utilization and above all, lower throughput. We argue that designing a system with a priori knowledge of the output sequence can mitigate this problem.

sequence, sequence length, throughput, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > Carlsbad (0.04)
Asia > Taiwan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLM Enhancement with Domain Expert Mental Model to Reduce LLM Hallucination with Causal Prompt Engineering

Kovalerchuk, Boris, Fegley, Brent D.

arXiv.org Artificial IntelligenceSep-16-2025

Difficult decision-making problems abound in various disciplines and domains. The proliferation of generative techniques, especially large language models (LLMs), has excited interest in using them for decision support. However, LLMs cannot yet resolve missingness in their training data, leading to hallucinations. Retrieval-Augmented Generation (RAG) enhances LLMs by incorporating external information retrieval, reducing hallucinations and improving accuracy. Yet, RAG and related methods are only partial solutions, as they may lack access to all necessary sources or key missing information. Even everyday issues often challenge LLMs' abilities. Submitting longer prompts with context and examples is one approach to address knowledge gaps, but designing effective prompts is non-trivial and may not capture complex mental models of domain experts. For tasks with missing critical information, LLMs are insufficient, as are many existing systems poorly represented in available documents. This paper explores how LLMs can make decision-making more efficient, using a running example of evaluating whether to respond to a call for proposals. We propose a technology based on optimized human-machine dialogue and monotone Boolean and k-valued functions to discover a computationally tractable personal expert mental model (EMM) of decision-making. Our EMM algorithm for LLM prompt engineering has four steps: (1) factor identification, (2) hierarchical structuring of factors, (3) generating a generalized expert mental model specification, and (4) generating a detailed generalized expert mental model from that specification.

large language model, machine learning, mental model, (21 more...)

arXiv.org Artificial Intelligence

2509.10818

Country: North America > United States > California > San Diego County > Carlsbad (0.04)

Genre:

Research Report (0.64)
Workflow (0.49)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Filters

Collaborating Authors

Carlsbad

Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models

Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models

Increasing GPU Utilization during Generative Inference for Higher Throughput

a9b7ba70783b617e9998dc4dd82eb3c5-Paper.pdf

1f14ac136d55c34a18a04ce3db083599-Paper-Conference.pdf

PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving

925869234d3aa2a3aad5f05b643974aa-Paper-Conference.pdf

Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models

Increasing GPU Utilization during Generative Inference for Higher Throughput

LLM Enhancement with Domain Expert Mental Model to Reduce LLM Hallucination with Causal Prompt Engineering